Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold
When training overparameterized deep networks for classification tasks, it
has been widely observed that the learned features exhibit a so-called "neural
collapse" phenomenon. More specifically, for the output features of the
penultimate layer, for each class the within-class features converge to their
means, and the means of different classes exhibit a certain tight frame
structure, which is also aligned with the last layer's classifier. As feature
normalization in the last layer becomes a common practice in modern
representation learning, in this work we theoretically justify the neural
collapse phenomenon for normalized features. Based on an unconstrained feature
model, we simplify the empirical loss function in a multi-class classification
task into a nonconvex optimization problem over a Riemannian manifold by
constraining all features and classifiers to the sphere. In this context, we
analyze the nonconvex landscape of the Riemannian optimization problem over the
product of spheres, showing a benign global landscape in the sense that the
only global minimizers are the neural collapse solutions while all other
critical points are strict saddles with negative curvature. Experimental
results on practical deep networks corroborate our theory and demonstrate that
better representations can be learned faster via feature normalization.

Comment: The first two authors contributed to this work equally; 38 pages, 13 figures. Accepted at NeurIPS'22.
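To make the setup concrete, the following PyTorch sketch (our illustration, not the authors' code; the dimensions, temperature, and optimizer are assumptions) trains the spherically constrained unconstrained feature model: features and classifier weights are free variables renormalized onto the unit sphere inside the loss, a standard surrogate for optimizing over the product of spheres.

    # Minimal sketch of the spherically constrained unconstrained feature model.
    # All sizes and hyperparameters below are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    K, d, n = 10, 128, 50                 # classes, feature dim, samples per class
    tau = 0.1                             # temperature for logits on the sphere

    H = torch.randn(K * n, d, requires_grad=True)   # features as free variables
    W = torch.randn(K, d, requires_grad=True)       # one classifier per class
    y = torch.arange(K).repeat_interleave(n)        # class labels

    opt = torch.optim.SGD([H, W], lr=0.5)
    for _ in range(2000):
        Hn = F.normalize(H, dim=1)        # project features onto the sphere
        Wn = F.normalize(W, dim=1)        # project classifiers onto the sphere
        loss = F.cross_entropy(Hn @ Wn.t() / tau, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

At convergence one can check the neural collapse predictions directly: the within-class features agree with their class means, and the K means form the tight-frame structure described above, aligned with the rows of Wn.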
Randomized Histogram Matching: A Simple Augmentation for Unsupervised Domain Adaptation in Overhead Imagery
Modern deep neural networks (DNNs) are highly accurate on many recognition
tasks for overhead (e.g., satellite) imagery. However, visual domain shifts
(e.g., statistical changes due to geography, sensor, or atmospheric conditions)
remain a challenge, causing the accuracy of DNNs to degrade substantially and
unpredictably when testing on new sets of imagery. In this work, we model
domain shifts caused by variations in imaging hardware, lighting, and other
conditions as non-linear pixel-wise transformations, and we perform a
systematic study indicating that modern DNNs can become largely robust to these
types of transformations, if provided with appropriate training data
augmentation. In general, however, we do not know the transformation between
two sets of imagery. To overcome this, we propose a fast real-time unsupervised
training augmentation technique, termed randomized histogram matching (RHM). We
conduct experiments with two large benchmark datasets for building segmentation
and find that despite its simplicity, RHM consistently yields similar or
superior performance compared to state-of-the-art unsupervised domain
adaptation approaches, while being significantly simpler and more
computationally efficient. RHM also offers substantially better performance
than other comparably simple approaches that are widely used for overhead
imagery.

Comment: Includes a main paper (10 pages). This paper is currently undergoing peer review.
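Reading from the abstract, RHM boils down to matching each training image's histogram to a randomly drawn reference image on the fly. Here is a minimal sketch of that idea using scikit-image (the function name rhm_augment, the sampling probability p, and drawing references from an unlabeled pool are our assumptions, not the authors' released implementation):

    # Minimal sketch of randomized histogram matching as a training augmentation.
    import numpy as np
    from skimage.exposure import match_histograms   # requires scikit-image >= 0.19

    def rhm_augment(src_img, reference_pool, p=0.5, rng=np.random.default_rng()):
        """src_img: HxWxC array; reference_pool: list of HxWxC arrays."""
        if rng.random() > p:
            return src_img                          # keep the image unchanged
        ref = reference_pool[rng.integers(len(reference_pool))]
        # Per-channel histogram matching models the unknown pixel-wise shift.
        return match_histograms(src_img, ref, channel_axis=-1)

Applied as a per-sample transform during training, this exposes the DNN to a wide range of non-linear pixel-wise transformations without requiring any target-domain labels.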
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
Over the past few years, an extensively studied phenomenon in training deep
networks is the implicit bias of gradient descent towards parsimonious
solutions. In this work, we investigate this phenomenon by narrowing our focus
to deep linear networks. Through our analysis, we reveal a surprising "law of
parsimony" in the learning dynamics when the data possesses low-dimensional
structures. Specifically, we show that the evolution of gradient descent
starting from orthogonal initialization only affects a minimal portion of
singular vector spaces across all weight matrices. In other words, the learning
process happens only within a small invariant subspace of each weight matrix,
despite the fact that all weight parameters are updated throughout training.
This simplicity in learning dynamics could have significant implications for
both efficient training and a better understanding of deep networks. First, the
analysis enables us to considerably improve training efficiency by taking
advantage of the low-dimensional structure in learning dynamics. We can
construct smaller, equivalent deep linear networks without sacrificing the
benefits associated with their wider counterparts. Second, it allows us to better
understand deep representation learning by elucidating the linear progressive
separation and concentration of representations from shallow to deep layers. We
also conduct numerical experiments to support our theoretical results. The code
for our experiments can be found at https://github.com/cjyaras/lawofparsimony.

Comment: The first two authors contributed to this work equally; 32 pages, 12 figures.
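The abstract's central claim is easy to probe numerically. Below is a small self-contained NumPy sketch (illustrative only, not the code from the linked repository; the depth, dimensions, and step size are arbitrary assumptions) that runs gradient descent on a depth-L linear network fitting a rank-r target from scaled orthogonal initialization, then measures the rank of each weight update:

    # Minimal sketch: gradient descent on a deep linear network fitting a
    # rank-r target, started from scaled orthogonal initialization.
    import numpy as np

    rng = np.random.default_rng(0)
    d, L, r, alpha, lr = 30, 4, 3, 0.5, 0.1

    Phi = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
    Phi /= np.linalg.norm(Phi, ord=2)     # rank-r target, unit spectral norm

    def orth(d):                          # random orthogonal matrix via QR
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        return q

    Ws = [alpha * orth(d) for _ in range(L)]
    W0s = [W.copy() for W in Ws]

    for _ in range(1000):
        prods = [np.eye(d)]               # prefix products W_l ... W_1
        for W in Ws:
            prods.append(W @ prods[-1])
        err = prods[-1] - Phi             # end-to-end residual
        grads, back = [], err
        for l in range(L - 1, -1, -1):    # backpropagate through the layers
            grads.insert(0, back @ prods[l].T)
            back = Ws[l].T @ back
        for W, g in zip(Ws, grads):
            W -= lr * g

    for l, (W, W0) in enumerate(zip(Ws, W0s)):
        s = np.linalg.svd(W - W0, compute_uv=False)
        print(f"layer {l}: numerical rank of W - W0 = {(s > 1e-6 * s[0]).sum()}")

Despite every entry of every weight matrix being updated at every step, each difference W - W0 should exhibit a numerical rank on the order of r rather than d, consistent with the invariant-subspace behavior the paper describes.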